Reviews: Supervised Word Mover's Distance
Overall the paper reads like a nice combination of existing tricks and provides very convincing experimental results. Strengths of the paper are its simplicity and a relatively straightforward idea that is nonetheless not trivial to implement and test; the experimental section is therefore a strong part of this paper.

Things to improve:
- Handle the interplay between the regularized and unregularized formulations more carefully.
- Be more rigorous with the mathematics (computations and notations are a bit sloppy).
- Ideally, provide an algorithm box to show more clearly what the authors propose.

A few minor comments:
- In Eq. 1, the Euclidean distance between word embeddings is used as a cost; in Eq. 6, for the purpose of Mahalanobis metric learning, that cost becomes the squared Euclidean metric (and thus what is usually referred to as 2-Wasserstein).
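The cost mismatch the reviewer points out can be made explicit. The notation below (word embeddings x_i, learned linear map A) is assumed for illustration rather than copied from the paper's equations:

```latex
\begin{align*}
c(i,j)   &= \lVert \mathbf{x}_i - \mathbf{x}_j \rVert_2
  && \text{(Eq.~1: Euclidean ground cost, unsupervised WMD)} \\
c_A(i,j) &= \lVert A\mathbf{x}_i - A\mathbf{x}_j \rVert_2^2
          = (\mathbf{x}_i - \mathbf{x}_j)^\top A^\top A\,(\mathbf{x}_i - \mathbf{x}_j)
  && \text{(Eq.~6: squared Mahalanobis cost)}
\end{align*}
```

With the squared cost of Eq. 6, the minimal transport cost is a squared 2-Wasserstein distance, whereas Eq. 1 defines a 1-Wasserstein-style distance; this is the interplay the reviewer asks the authors to handle consistently.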
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Supervised Word Mover's Distance
Huang, Gao, Guo, Chuan, Kusner, Matt J., Sun, Yu, Sha, Fei, Weinberger, Kilian Q.
Accurately measuring the similarity between text documents lies at the core of many real-world applications of machine learning, including web-search ranking, document recommendation, multi-lingual document matching, and article categorization. Recently, a new document metric, the word mover's distance (WMD), has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddings to document metrics by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this paper we propose an efficient technique to learn a supervised metric, which we call the Supervised WMD (S-WMD) metric. Our algorithm learns document distances that measure the underlying semantic differences between documents by leveraging semantic differences between individual words discovered during supervised training. This is achieved with a linear transformation of the underlying word embedding space and tailored word-specific weights, learned to minimize the stochastic leave-one-out nearest neighbor classification error on a per-document level. We evaluate our metric on eight real-world text classification tasks on which S-WMD consistently outperforms almost all of our 26 competitive baselines.
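The core computation the abstract describes, an optimal transport problem between embedded words under a learned linear transformation, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `s_wmd`, the toy data, and the use of entropy-regularized Sinkhorn iterations (which only approximate the exact transport cost) are assumptions, and the linear map `A` and word weights are fixed here rather than learned from supervision as in the paper.

```python
import numpy as np

def s_wmd(X1, w1, X2, w2, A, reg=0.5, n_iter=200):
    """Entropy-regularized transport cost between two documents.

    X1, X2: word-embedding matrices (one row per word); w1, w2: word
    weight vectors summing to 1; A: a linear map applied to the
    embeddings, standing in for the learned Mahalanobis metric.
    """
    # Squared Euclidean ground cost in the transformed space:
    # C[i, j] = ||A x_i - A x_j||^2
    Y1, Y2 = X1 @ A.T, X2 @ A.T
    C = ((Y1[:, None, :] - Y2[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-C / reg)             # Gibbs kernel
    u = np.ones_like(w1)
    for _ in range(n_iter):          # Sinkhorn fixed-point updates
        v = w2 / (K.T @ u)
        u = w1 / (K @ v)
    T = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((T * C).sum())

# Two tiny "documents" with 2-D embeddings and uniform word weights.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
w = np.array([0.5, 0.5])
A = np.eye(2)  # identity map: no supervision, plain squared-cost WMD
d_same = s_wmd(X, w, X, w, A)
d_diff = s_wmd(X, w, X + np.array([1.0, 0.0]), w, A)
# A document is closer to itself than to a shifted copy: d_same < d_diff.
```

Supervised training would then adjust `A` (and the word weights) by gradient descent so that documents sharing a label end up close under this distance, which is what the leave-one-out nearest neighbor objective in the abstract optimizes.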